Overview

Dataset statistics

Number of variables: 19
Number of observations: 8760687
Missing cells: 17521350
Missing cells (%): 10.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 GiB
Average record size in memory: 152.0 B
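
The percentage and size figures in this table can be cross-checked from its own counts. The sketch below is only a sanity check. It assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the total size is roughly the average record size times the number of observations; both assumptions are consistent with the reported numbers but are not stated by the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 19
n_observations = 8_760_687
missing_cells = 17_521_350
avg_record_bytes = 152.0

# Assumption: percentages are relative to the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is approximately average record size x observations.
total_gib = avg_record_bytes * n_observations / 2**30

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size: {total_gib:.1f} GiB")
```

Two of the 19 variables are almost entirely missing, which is why the share comes out close to 2/19.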

Variable types

Categorical: 5
DateTime: 2
Numeric: 11
Boolean: 1

Alerts

congestion_surcharge has constant value "2.5" [Constant]
airport_fee has constant value "0.0" [Constant]
trip_distance is highly correlated with fare_amount and 1 other fields [High correlation]
payment_type is highly correlated with tip_amount [High correlation]
fare_amount is highly correlated with trip_distance and 1 other fields [High correlation]
tip_amount is highly correlated with payment_type and 1 other fields [High correlation]
total_amount is highly correlated with trip_distance and 2 other fields [High correlation]
fare_amount is highly correlated with tip_amount and 1 other fields [High correlation]
mta_tax is highly correlated with improvement_surcharge [High correlation]
tip_amount is highly correlated with fare_amount and 1 other fields [High correlation]
tolls_amount is highly correlated with total_amount [High correlation]
improvement_surcharge is highly correlated with mta_tax [High correlation]
total_amount is highly correlated with fare_amount and 2 other fields [High correlation]
tip_amount is highly correlated with payment_type [High correlation]
total_amount is highly correlated with trip_distance and 1 other fields [High correlation]
improvement_surcharge is highly correlated with airport_fee and 1 other fields [High correlation]
airport_fee is highly correlated with improvement_surcharge and 4 other fields [High correlation]
congestion_surcharge is highly correlated with improvement_surcharge and 4 other fields [High correlation]
payment_type is highly correlated with airport_fee and 1 other fields [High correlation]
store_and_fwd_flag is highly correlated with airport_fee and 1 other fields [High correlation]
VendorID is highly correlated with airport_fee and 1 other fields [High correlation]
fare_amount is highly correlated with extra and 2 other fields [High correlation]
extra is highly correlated with fare_amount and 2 other fields [High correlation]
mta_tax is highly correlated with fare_amount and 2 other fields [High correlation]
congestion_surcharge has 8760675 (> 99.9%) missing values [Missing]
airport_fee has 8760675 (> 99.9%) missing values [Missing]
trip_distance is highly skewed (γ1 = 2945.565755) [Skewed]
RatecodeID is highly skewed (γ1 = 131.785505) [Skewed]
fare_amount is highly skewed (γ1 = 61.39737964) [Skewed]
mta_tax is highly skewed (γ1 = 141.0589211) [Skewed]
tolls_amount is highly skewed (γ1 = 131.6560327) [Skewed]
total_amount is highly skewed (γ1 = 35.98422193) [Skewed]
extra has 4744786 (54.2%) zeros [Zeros]
tip_amount has 2895926 (33.1%) zeros [Zeros]
tolls_amount has 8335744 (95.1%) zeros [Zeros]
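
The percentages in the Zeros alerts follow directly from the counts and the number of observations. A minimal sketch of that arithmetic, using only figures quoted in this report (the variable names are illustrative):

```python
n_observations = 8_760_687

# Zero counts copied from the "Zeros" alerts.
zero_counts = {
    "extra": 4_744_786,
    "tip_amount": 2_895_926,
    "tolls_amount": 8_335_744,
}

# Share of zeros per variable, rounded to one decimal as in the report.
zero_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```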

Reproduction

Analysis started: 2022-07-18 19:07:00.712163
Analysis finished: 2022-07-18 19:38:26.227214
Duration: 31 minutes and 25.52 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
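
The reported duration can be reproduced from the two timestamps. This is a small sketch using only the Python standard library; it says nothing about how pandas-profiling measures the duration internally.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2022-07-18 19:07:00.712163")
finished = datetime.fromisoformat("2022-07-18 19:38:26.227214")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```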

Variables

VendorID
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.8 MiB

Value  Count
2      4914553
1      3846134

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8760687
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
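
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below is not the report's own implementation; the helper `char_categories` is a hypothetical name. The two-letter codes `Nd`, `Po` and `Pd` correspond to the names Decimal Number, Other Punctuation and Dash Punctuation that appear in this report.

```python
import unicodedata

def char_categories(values):
    """Return the set of Unicode general categories used by the characters in values."""
    return {unicodedata.category(ch) for value in values for ch in value}

# VendorID takes the values "1" and "2": digits only.
vendor_categories = char_categories(["1", "2"])

# improvement_surcharge takes values such as "-0.3": digits, a full stop, a hyphen.
surcharge_categories = char_categories(["0.3", "-0.3", "0.0", "1.0"])

print(sorted(vendor_categories))
print(sorted(surcharge_categories))
```

This matches the counts in the report: one distinct category for VendorID and three for improvement_surcharge.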

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Most occurring characters

Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  8760687  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Most occurring scripts

Value   Count    Frequency (%)
Common  8760687  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  8760687  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
2      4914553  56.1%
1      3846134  43.9%

Distinct: 2311532
Distinct (%): 26.4%
Missing: 0
Missing (%): 0.0%
Memory size: 66.8 MiB
Minimum: 2001-01-05 11:45:23
Maximum: 2018-07-27 04:06:37
[Figure: Histogram with fixed size bins (bins=50)]

Distinct: 2315089
Distinct (%): 26.4%
Missing: 0
Missing (%): 0.0%
Memory size: 66.8 MiB
Minimum: 2001-01-05 11:52:05
Maximum: 2018-07-27 04:46:57
[Figure: Histogram with fixed size bins (bins=50)]

passenger_count
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.606807434
Minimum: 0
Maximum: 9
Zeros: 59269
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 66.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 9
Range: 9
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.258420269
Coefficient of variation (CV): 0.7831805118
Kurtosis: 4.110614297
Mean: 1.606807434
Median Absolute Deviation (MAD): 0
Skewness: 2.236753458
Sum: 14076737
Variance: 1.583621573
Monotonicity: Not monotonic
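
Several of these descriptive statistics are related by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A short check using the passenger_count figures (inputs are the rounded values printed in the report, so agreement is only up to rounding):

```python
n_observations = 8_760_687
total = 14_076_737          # "Sum"
std = 1.258420269           # "Standard deviation"

mean = total / n_observations   # compare with "Mean"
cv = std / mean                 # compare with "Coefficient of variation (CV)"
variance = std ** 2             # compare with "Variance"

print(f"mean = {mean:.9f}")
print(f"CV = {cv:.10f}")
print(f"variance = {variance:.9f}")
```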
[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value  Count    Frequency (%)
1      6249564  71.3%
2      1271678  14.5%
5      414473   4.7%
3      351928   4.0%
6      250115   2.9%
4      163572   1.9%
0      59269    0.7%
7      39       < 0.1%
9      25       < 0.1%
8      24       < 0.1%

Lowest values (ascending)

Value  Count    Frequency (%)
0      59269    0.7%
1      6249564  71.3%
2      1271678  14.5%
3      351928   4.0%
4      163572   1.9%
5      414473   4.7%
6      250115   2.9%
7      39       < 0.1%
8      24       < 0.1%
9      25       < 0.1%

Highest values (descending)

Value  Count    Frequency (%)
9      25       < 0.1%
8      24       < 0.1%
7      39       < 0.1%
6      250115   2.9%
5      414473   4.7%
4      163572   1.9%
3      351928   4.0%
2      1271678  14.5%
1      6249564  71.3%
0      59269    0.7%

trip_distance
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 4397
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.804022101
Minimum: 0
Maximum: 189483.84
Zeros: 55377
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 66.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.47
Q1: 0.91
Median: 1.55
Q3: 2.84
95-th percentile: 10.31
Maximum: 189483.84
Range: 189483.84
Interquartile range (IQR): 1.93
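
The interquartile range is Q3 minus Q1. One common way to put the reported maximum in context is Tukey's 1.5 × IQR fences; that rule is a general convention introduced here for illustration and is not something this report applies.

```python
q1, q3 = 0.91, 2.84          # quartiles reported for trip_distance
reported_max = 189483.84     # reported maximum

iqr = q3 - q1

# Tukey's rule (an assumption for illustration, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.2f}")
print(f"Tukey fences: [{lower_fence:.3f}, {upper_fence:.3f}]")
print("maximum beyond upper fence:", reported_max > upper_fence)
```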

Descriptive statistics

Standard deviation: 64.12049747
Coefficient of variation (CV): 22.86732955
Kurtosis: 8704373.201
Mean: 2.804022101
Median Absolute Deviation (MAD): 0.75
Skewness: 2945.565755
Sum: 24565159.97
Variance: 4111.438195
Monotonicity: Not monotonic
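
A skewness this large is typical of a variable dominated by a few extreme values. The sketch below uses the population (moment-based) skewness, which may differ slightly from the estimator used to produce the report, and a hypothetical handful of trip distances rather than the real data. It only illustrates how a single extreme value inflates the statistic.

```python
from statistics import fmean

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    values = list(values)
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)
    m3 = fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5

# Hypothetical sample of ordinary trip distances (not taken from the data set).
typical = [0.5, 0.7, 0.9, 1.2, 1.5, 1.9, 2.3, 2.8, 3.4]

# The same sample plus the maximum reported for trip_distance.
with_outlier = typical + [189483.84]

print("without outlier:", skewness(typical))
print("with outlier:   ", skewness(with_outlier))
```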
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
0.8                  212156   2.4%
0.9                  210312   2.4%
0.7                  204209   2.3%
1                    203818   2.3%
1.1                  191421   2.2%
0.6                  184251   2.1%
1.2                  179719   2.1%
1.3                  167783   1.9%
1.4                  155003   1.8%
0.5                  150137   1.7%
Other values (4387)  6901878  78.8%

Lowest values (ascending)

Value  Count  Frequency (%)
0      55377  0.6%
0.01   2660   < 0.1%
0.02   1963   < 0.1%
0.03   1656   < 0.1%
0.04   1356   < 0.1%
0.05   1153   < 0.1%
0.06   1063   < 0.1%
0.07   991    < 0.1%
0.08   925    < 0.1%
0.09   894    < 0.1%

Highest values (descending)

Value      Count  Frequency (%)
189483.84  1      < 0.1%
830.8      1      < 0.1%
484.91     1      < 0.1%
267.7      1      < 0.1%
252.1      1      < 0.1%
132.61     1      < 0.1%
130.46     1      < 0.1%
128.73     1      < 0.1%
127.06     1      < 0.1%
123.3      1      < 0.1%

RatecodeID
Real number (ℝ≥0)

SKEWED

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0395453
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4450619326
Coefficient of variation (CV): 0.4281313497
Kurtosis: 28417.31093
Mean: 1.0395453
Median Absolute Deviation (MAD): 0
Skewness: 131.785505
Sum: 9107131
Variance: 0.1980801239
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Common values

Value  Count    Frequency (%)
1      8533295  97.4%
2      179273   2.0%
5      27735    0.3%
3      15129    0.2%
4      5080     0.1%
99     106      < 0.1%
6      69       < 0.1%

Lowest values (ascending)

Value  Count    Frequency (%)
1      8533295  97.4%
2      179273   2.0%
3      15129    0.2%
4      5080     0.1%
5      27735    0.3%
6      69       < 0.1%
99     106      < 0.1%

Highest values (descending)

Value  Count    Frequency (%)
99     106      < 0.1%
6      69       < 0.1%
5      27735    0.3%
4      5080     0.1%
3      15129    0.2%
2      179273   2.0%
1      8533295  97.4%

store_and_fwd_flag
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.4 MiB

Value  Count    Frequency (%)
False  8729886  99.6%
True   30801    0.4%
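
The per-variable memory sizes in this report are consistent with a simple storage model: 8 bytes per value for most columns and 1 byte per value for this Boolean column. The report does not state the underlying dtypes, so the byte widths below are assumptions chosen because they reproduce the printed figures.

```python
n_observations = 8_760_687
MIB = 2**20

# Assumption: 8 bytes per value (e.g. a 64-bit number or an object pointer).
eight_byte_column_mib = n_observations * 8 / MIB

# Assumption: 1 byte per value for a Boolean column.
one_byte_column_mib = n_observations * 1 / MIB

print(f"8-byte column: {eight_byte_column_mib:.1f} MiB")
print(f"1-byte column: {one_byte_column_mib:.1f} MiB")
```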

PULocationID
Real number (ℝ≥0)

Distinct: 259
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 164.4579091
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 116
Median: 162
Q3: 234
95-th percentile: 262
Maximum: 265
Range: 264
Interquartile range (IQR): 118

Descriptive statistics

Standard deviation: 66.35990085
Coefficient of variation (CV): 0.4035068987
Kurtosis: -0.8874974706
Mean: 164.4579091
Median Absolute Deviation (MAD): 67
Skewness: -0.2792103357
Sum: 1440764266
Variance: 4403.63644
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count    Frequency (%)
237                 361012   4.1%
161                 354958   4.1%
236                 345712   3.9%
230                 309228   3.5%
162                 308339   3.5%
186                 292545   3.3%
234                 283990   3.2%
170                 277931   3.2%
48                  270624   3.1%
142                 264095   3.0%
Other values (249)  5692253  65.0%

Lowest values (ascending)

Value  Count  Frequency (%)
1      571    < 0.1%
2      4      < 0.1%
3      37     < 0.1%
4      19656  0.2%
5      2      < 0.1%
6      51     < 0.1%
7      14673  0.2%
8      71     < 0.1%
9      46     < 0.1%
10     2079   < 0.1%

Highest values (descending)

Value  Count   Frequency (%)
265    3817    < 0.1%
264    155094  1.8%
263    170014  1.9%
262    109309  1.2%
261    47424   0.5%
260    5308    0.1%
259    63      < 0.1%
258    98      < 0.1%
257    256     < 0.1%
256    6546    0.1%

DOLocationID
Real number (ℝ≥0)

Distinct: 261
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 162.7269672
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 43
Q1: 113
Median: 162
Q3: 234
95-th percentile: 262
Maximum: 265
Range: 264
Interquartile range (IQR): 121

Descriptive statistics

Standard deviation: 70.31144507
Coefficient of variation (CV): 0.4320823173
Kurtosis: -0.9421178723
Mean: 162.7269672
Median Absolute Deviation (MAD): 68
Skewness: -0.3373784234
Sum: 1425600026
Variance: 4943.699308
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count    Frequency (%)
236                 360459   4.1%
161                 337453   3.9%
237                 320277   3.7%
170                 279155   3.2%
230                 270526   3.1%
162                 267091   3.0%
234                 245764   2.8%
142                 239840   2.7%
48                  234355   2.7%
239                 222967   2.5%
Other values (251)  5982800  68.3%

Lowest values (ascending)

Value  Count  Frequency (%)
1      14044  0.2%
2      5      < 0.1%
3      515    < 0.1%
4      42568  0.5%
5      55     < 0.1%
6      167    < 0.1%
7      38223  0.4%
8      130    < 0.1%
9      450    < 0.1%
10     3086   < 0.1%

Highest values (descending)

Value  Count   Frequency (%)
265    19413   0.2%
264    141297  1.6%
263    173597  2.0%
262    118036  1.3%
261    40265   0.5%
260    9728    0.1%
259    846     < 0.1%
258    1400    < 0.1%
257    4209    < 0.1%
256    20113   0.2%

payment_type
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.8 MiB

Value  Count
1      6106416
2      2599215
3      43204
4      11852

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8760687
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 1
4th row: 2
5th row: 1

Common Values

Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

Most occurring characters

Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  8760687  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  8760687  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  8760687  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
1      6106416  69.7%
2      2599215  29.7%
3      43204    0.5%
4      11852    0.1%

fare_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1714
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.244426
Minimum: -450
Maximum: 8016
Zeros: 2238
Zeros (%): < 0.1%
Negative: 4260
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -450
5-th percentile: 4
Q1: 6
Median: 9
Q3: 13.5
95-th percentile: 34
Maximum: 8016
Range: 8466
Interquartile range (IQR): 7.5

Descriptive statistics

Standard deviation: 11.68320695
Coefficient of variation (CV): 0.954165344
Kurtosis: 33432.27545
Mean: 12.244426
Median Absolute Deviation (MAD): 3
Skewness: 61.39737964
Sum: 107269583.7
Variance: 136.4973245
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
6                    473270   5.4%
5.5                  465207   5.3%
6.5                  461959   5.3%
7                    446414   5.1%
5                    433292   4.9%
7.5                  421895   4.8%
8                    399185   4.6%
8.5                  369251   4.2%
4.5                  359667   4.1%
9                    342286   3.9%
Other values (1704)  4588261  52.4%

Lowest values (ascending)

Value  Count  Frequency (%)
-450   1      < 0.1%
-430   1      < 0.1%
-420   1      < 0.1%
-231   1      < 0.1%
-207   1      < 0.1%
-200   1      < 0.1%
-198   2      < 0.1%
-180   1      < 0.1%
-160   1      < 0.1%
-150   2      < 0.1%

Highest values (descending)

Value   Count  Frequency (%)
8016    1      < 0.1%
5724.5  1      < 0.1%
3013.5  1      < 0.1%
3006    1      < 0.1%
2693    1      < 0.1%
2409    1      < 0.1%
2309.5  1      < 0.1%
1018.5  1      < 0.1%
1016.5  1      < 0.1%
980.06  1      < 0.1%

extra
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 42
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3246882465
Minimum: -44.69
Maximum: 60
Zeros: 4744786
Zeros (%): 54.2%
Negative: 2106
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -44.69
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.5
95-th percentile: 1
Maximum: 60
Range: 104.69
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.4502554665
Coefficient of variation (CV): 1.386731646
Kurtosis: 74.60443447
Mean: 0.3246882465
Median Absolute Deviation (MAD): 0
Skewness: 3.361137354
Sum: 2844492.1
Variance: 0.2027299852
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=42)]

Common values

Value              Count    Frequency (%)
0                  4744786  54.2%
0.5                2549099  29.1%
1                  1433472  16.4%
4.5                30512    0.3%
-0.5               1484     < 0.1%
-1                 572      < 0.1%
0.3                380      < 0.1%
1.3                178      < 0.1%
0.8                89       < 0.1%
-4.5               43       < 0.1%
Other values (32)  72       < 0.1%

Lowest values (ascending)

Value   Count    Frequency (%)
-44.69  1        < 0.1%
-9.61   1        < 0.1%
-5.53   1        < 0.1%
-4.5    43       < 0.1%
-1      572      < 0.1%
-0.5    1484     < 0.1%
-0.49   2        < 0.1%
-0.4    1        < 0.1%
-0.35   1        < 0.1%
0       4744786  54.2%

Highest values (descending)

Value  Count  Frequency (%)
60     1      < 0.1%
30.5   1      < 0.1%
25     1      < 0.1%
19     1      < 0.1%
18     1      < 0.1%
15.55  1      < 0.1%
10.5   1      < 0.1%
6.52   2      < 0.1%
4.8    8      < 0.1%
4.54   3      < 0.1%

mta_tax
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4975066111
Minimum: -0.5
Maximum: 45.49
Zeros: 35938
Zeros (%): 0.4%
Negative: 4153
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -0.5
5-th percentile: 0.5
Q1: 0.5
Median: 0.5
Q3: 0.5
95-th percentile: 0.5
Maximum: 45.49
Range: 45.99
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.04333281101
Coefficient of variation (CV): 0.08709997022
Kurtosis: 144286.3777
Mean: 0.4975066111
Median Absolute Deviation (MAD): 0
Skewness: 141.0589211
Sum: 4358499.7
Variance: 0.00187773251
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=15)]

Common values

Value             Count    Frequency (%)
0.5               8720505  99.5%
0                 35938    0.4%
-0.5              4153     < 0.1%
3                 73       < 0.1%
0.01              5        < 0.1%
0.35              4        < 0.1%
0.9               1        < 0.1%
0.4               1        < 0.1%
6.33              1        < 0.1%
15                1        < 0.1%
Other values (5)  5        < 0.1%

Lowest values (ascending)

Value  Count    Frequency (%)
-0.5   4153     < 0.1%
0      35938    0.4%
0.01   5        < 0.1%
0.32   1        < 0.1%
0.35   4        < 0.1%
0.4    1        < 0.1%
0.5    8720505  99.5%
0.6    1        < 0.1%
0.9    1        < 0.1%
3      73       < 0.1%

Highest values (descending)

Value  Count    Frequency (%)
45.49  1        < 0.1%
23.8   1        < 0.1%
15     1        < 0.1%
10.41  1        < 0.1%
6.33   1        < 0.1%
3      73       < 0.1%
0.9    1        < 0.1%
0.6    1        < 0.1%
0.5    8720505  99.5%
0.4    1        < 0.1%

tip_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 3397
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.818759305
Minimum: -88.8
Maximum: 441.71
Zeros: 2895926
Zeros (%): 33.1%
Negative: 55
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -88.8
5-th percentile: 0
Q1: 0
Median: 1.36
Q3: 2.35
95-th percentile: 6.05
Maximum: 441.71
Range: 530.51
Interquartile range (IQR): 2.35

Descriptive statistics

Standard deviation: 2.486374703
Coefficient of variation (CV): 1.36707188
Kurtosis: 599.1830975
Mean: 1.818759305
Median Absolute Deviation (MAD): 1.36
Skewness: 8.70551644
Sum: 15933581
Variance: 6.182059164
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
0                    2895926  33.1%
1                    529907   6.0%
2                    271504   3.1%
1.5                  108003   1.2%
1.46                 96331    1.1%
1.56                 95907    1.1%
1.36                 94515    1.1%
1.66                 93161    1.1%
1.45                 90280    1.0%
1.76                 89817    1.0%
Other values (3387)  4395336  50.2%

Lowest values (ascending)

Value   Count  Frequency (%)
-88.8   1      < 0.1%
-13.06  1      < 0.1%
-11     1      < 0.1%
-10     1      < 0.1%
-7.92   1      < 0.1%
-3.32   1      < 0.1%
-3      1      < 0.1%
-2.3    1      < 0.1%
-2      2      < 0.1%
-1.66   1      < 0.1%

Highest values (descending)

Value   Count  Frequency (%)
441.71  1      < 0.1%
415     1      < 0.1%
411     1      < 0.1%
355     1      < 0.1%
330     1      < 0.1%
310     1      < 0.1%
260     2      < 0.1%
250     1      < 0.1%
230     1      < 0.1%
220     2      < 0.1%

tolls_amount
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 967
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3026157184
Minimum: -15
Maximum: 950.7
Zeros: 8335744
Zeros (%): 95.1%
Negative: 25
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -15
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 950.7
Range: 965.7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.738183755
Coefficient of variation (CV): 5.74386474
Kurtosis: 55788.09396
Mean: 0.3026157184
Median Absolute Deviation (MAD): 0
Skewness: 131.6560327
Sum: 2651121.59
Variance: 3.021282764
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count    Frequency (%)
0                   8335744  95.1%
5.76                391626   4.5%
10.5                6018     0.1%
2.64                3870     < 0.1%
12.5                3816     < 0.1%
5.54                2557     < 0.1%
11.52               2083     < 0.1%
17.5                848      < 0.1%
15.5                780      < 0.1%
16.26               620      < 0.1%
Other values (957)  12725    0.1%

Lowest values (ascending)

Value   Count    Frequency (%)
-15     1        < 0.1%
-12.5   3        < 0.1%
-11.52  1        < 0.1%
-10.5   4        < 0.1%
-5.76   15       < 0.1%
-4      1        < 0.1%
0       8335744  95.1%
0.01    10       < 0.1%
0.02    1        < 0.1%
0.03    1        < 0.1%

Highest values (descending)

Value   Count  Frequency (%)
950.7   1      < 0.1%
910.9   1      < 0.1%
910.5   1      < 0.1%
821     1      < 0.1%
795.76  1      < 0.1%
765.76  1      < 0.1%
600     1      < 0.1%
576     3      < 0.1%
567.56  1      < 0.1%
556.33  1      < 0.1%

improvement_surcharge
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.8 MiB

Value  Count
0.3    8753737
-0.3   4255
0.0    2569
1.0    126

Length

Max length: 4
Median length: 3
Mean length: 3.000485693
Min length: 3

Characters and Unicode

Total characters: 26286316
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
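
The character totals for this variable can be reproduced from its four values and their counts, assuming each value is treated as the string shown in the report (for example "-0.3" has four characters). A minimal sketch of that bookkeeping:

```python
from collections import Counter

# Value counts copied from the improvement_surcharge summary.
value_counts = {"0.3": 8_753_737, "-0.3": 4_255, "0.0": 2_569, "1.0": 126}

n_values = sum(value_counts.values())
total_characters = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_characters / n_values

# Per-character totals: each character of a value occurs once per row holding that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

print(total_characters, round(mean_length, 9))
print(char_counts.most_common())
```

Only the 4,255 rows holding "-0.3" are four characters long, which is why the mean length sits just above 3.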

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.3
2nd row: 0.3
3rd row: 0.3
4th row: 0.3
5th row: 0.3

Common Values

Value  Count    Frequency (%)
0.3    8753737  99.9%
-0.3   4255     < 0.1%
0.0    2569     < 0.1%
1.0    126      < 0.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count    Frequency (%)
0.3    8757992  > 99.9%
0.0    2569     < 0.1%
1.0    126      < 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      8763256  33.3%
.      8760687  33.3%
3      8757992  33.3%
-      4255     < 0.1%
1      126      < 0.1%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     17521374  66.7%
Other Punctuation  8760687   33.3%
Dash Punctuation   4255      < 0.1%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      8763256  50.0%
3      8757992  50.0%
1      126      < 0.1%

Other Punctuation
Value  Count    Frequency (%)
.      8760687  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      4255   100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  26286316  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      8763256  33.3%
.      8760687  33.3%
3      8757992  33.3%
-      4255     < 0.1%
1      126      < 0.1%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  26286316  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      8763256  33.3%
.      8760687  33.3%
3      8757992  33.3%
-      4255     < 0.1%
1      126      < 0.1%

total_amount
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 11514
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.49108833
Minimum: -450.3
Maximum: 8016.8
Zeros: 1066
Zeros (%): < 0.1%
Negative: 4260
Negative (%): < 0.1%
Memory size: 66.8 MiB

Quantile statistics

Minimum: -450.3
5-th percentile: 5.8
Q1: 8.3
Median: 11.3
Q3: 16.62
95-th percentile: 43.95
Maximum: 8016.8
Range: 8467.1
Interquartile range (IQR): 8.32

Descriptive statistics

Standard deviation: 14.19545671
Coefficient of variation (CV): 0.9163627758
Kurtosis: 15348.25133
Mean: 15.49108833
Median Absolute Deviation (MAD): 3.5
Skewness: 35.98422193
Sum: 135712576.2
Variance: 201.5109911
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count    Frequency (%)
7.3                   208335   2.4%
7.8                   206427   2.4%
6.8                   203105   2.3%
8.3                   199717   2.3%
8.8                   195497   2.2%
6.3                   185976   2.1%
9.3                   173507   2.0%
5.8                   162675   1.9%
9.8                   160903   1.8%
10.3                  147321   1.7%
Other values (11504)  6917224  79.0%

Lowest values (ascending)

Value   Count  Frequency (%)
-450.3  1      < 0.1%
-430.3  1      < 0.1%
-420.8  1      < 0.1%
-231.8  1      < 0.1%
-209.3  1      < 0.1%
-207.3  1      < 0.1%
-200.3  1      < 0.1%
-198.5  1      < 0.1%
-180.8  1      < 0.1%
-160.8  1      < 0.1%

Highest values (descending)

Value    Count  Frequency (%)
8016.8   1      < 0.1%
5726.3   1      < 0.1%
3014.3   1      < 0.1%
3006.8   1      < 0.1%
2694.8   1      < 0.1%
2417.81  1      < 0.1%
2310.3   1      < 0.1%
1019.3   1      < 0.1%
1017.3   1      < 0.1%
1003.5   1      < 0.1%

congestion_surcharge
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): 8.3%
Missing: 8760675
Missing (%): > 99.9%
Memory size: 66.8 MiB

Value  Count
2.5    12
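
A distinct share of 8.3% for a variable with a single value looks odd until the missing rows are excluded. The sketch below assumes that "Distinct (%)" is computed over non-missing values only; this assumption reproduces the reported figure but is not stated in the report.

```python
n_observations = 8_760_687
n_missing = 8_760_675
n_distinct = 1

n_present = n_observations - n_missing

# Assumption: the distinct share is relative to non-missing values.
distinct_pct = 100 * n_distinct / n_present
missing_pct = 100 * n_missing / n_observations

print(f"non-missing rows: {n_present}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Missing (%): {missing_pct:.5f}%")
```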

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 36
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.5
2nd row: 2.5
3rd row: 2.5
4th row: 2.5
5th row: 2.5

Common Values

Value      Count    Frequency (%)
2.5        12       < 0.1%
(Missing)  8760675  > 99.9%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
2.5    12     100.0%

Most occurring characters

Value  Count  Frequency (%)
2      12     33.3%
.      12     33.3%
5      12     33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     24     66.7%
Other Punctuation  12     33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      12     50.0%
5      12     50.0%

Other Punctuation
Value  Count  Frequency (%)
.      12     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  36     100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      12     33.3%
.      12     33.3%
5      12     33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  36     100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      12     33.3%
.      12     33.3%
5      12     33.3%

airport_fee
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): 8.3%
Missing: 8760675
Missing (%): > 99.9%
Memory size: 66.8 MiB

Value  Count
0.0    12

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 36
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count    Frequency (%)
0.0        12       < 0.1%
(Missing)  8760675  > 99.9%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0.0    12     100.0%

Most occurring characters

Value  Count  Frequency (%)
0      24     66.7%
.      12     33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     24     66.7%
Other Punctuation  12     33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      24     100.0%

Other Punctuation
Value  Count  Frequency (%)
.      12     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  36     100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      24     66.7%
.      12     33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  36     100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      24     66.7%
.      12     33.3%

Interactions

[Figures: variable interaction plots (Matplotlib v3.5.1)]
2022-07-18T16:35:02.306209image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:35:58.305959image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:36:57.949999image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:27:34.101630image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:28:42.233664image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:29:43.427853image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:30:38.151305image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:31:34.645916image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:32:27.315066image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:33:20.089069image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:34:11.599071image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:35:07.691733image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:36:03.287959image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:37:02.860008image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:27:40.571684image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:28:48.627657image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:29:48.055888image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:30:43.825270image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:31:39.868917image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:32:32.006070image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:33:24.722067image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:34:16.533066image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:35:12.934730image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:36:08.378959image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:37:07.394007image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:27:47.053221image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:28:56.185984image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:29:52.505889image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:30:49.479861image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:31:44.917921image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:32:36.672065image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:33:29.267067image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:34:21.353068image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:35:17.756732image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:36:13.397258image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:37:11.941010image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:27:53.403517image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:29:03.276497image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:29:57.026887image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:30:54.554852image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:31:50.183917image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:32:41.171072image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:33:33.905072image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:34:26.527074image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:35:22.808731image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-18T16:36:18.784866image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
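
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the helper names `average_ranks` and `spearman_rho` are hypothetical, and ties are assumed to receive the mean of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    # The 1/n factors of covariance and standard deviations cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# A monotonic but nonlinear relationship (y = x cubed) still yields rho of about 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because only the ranks enter the calculation, any strictly increasing transformation of X or Y leaves ρ unchanged.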

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
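
As a minimal sketch of this formula (the function name `pearson_r` is hypothetical, and the input lists simply reuse trip_distance and fare_amount from the first five sample rows as illustrative values):

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors in the covariance and both standard deviations cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


trip_distance = [0.5, 2.7, 0.8, 10.2, 2.5]
fare_amount = [4.5, 14.0, 6.0, 33.5, 12.5]
r = pearson_r(trip_distance, fare_amount)

# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(trip_distance, [3 * f + 7 for f in fare_amount])
```

The second call illustrates the invariance under changes in location and scale: both results agree up to floating-point rounding.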

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
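
The pair-counting definition above can be sketched directly. The function name `kendall_tau` is hypothetical, and this simple form (often called tau-a) makes no correction for ties; statistical libraries commonly default to a tie-corrected variant, so values may differ on data with many repeated values.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it counts toward neither group.
    return (concordant - discordant) / len(pairs)


# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
```

This direct version compares every pair, so it takes quadratic time and is meant only to illustrate the definition.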

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
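
A minimal sketch of a bias-corrected Cramér's V, following the correction published by Bergsma (2013), is shown below. The function name `cramers_v_corrected` and the category labels are hypothetical, and the report's own implementation may differ in details such as edge-case handling.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_total in row_totals.items():
        for col_label, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bergsma's correction shrinks phi-squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfectly associated labels give 1; independent labels give 0.
v_perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
```

Clipping the corrected phi-squared at zero keeps the coefficient inside the stated 0 to 1 range when the uncorrected statistic is smaller than its expected bias.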

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
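
The first and third views can be approximated in a few lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the four rows are made-up illustrative records (not taken from the dataset), and nullity correlation is taken to be the Pearson correlation between per-column "value is present" indicators.

```python
from math import isnan, sqrt

NAN = float("nan")


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and isnan(value))


def nullity_counts(rows, columns):
    """Number of missing cells per column."""
    return {c: sum(is_missing(row[c]) for row in rows) for c in columns}


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation of the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because its indicator then has zero variance.
    """
    a = [0.0 if is_missing(row[col_a]) else 1.0 for row in rows]
    b = [0.0 if is_missing(row[col_b]) else 1.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return None
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return cov / sqrt(var_a * var_b)


# Hypothetical records in which two columns are always missing together.
rows = [
    {"tip_amount": 0.0, "congestion_surcharge": NAN, "airport_fee": NAN},
    {"tip_amount": 1.0, "congestion_surcharge": 2.5, "airport_fee": 0.0},
    {"tip_amount": None, "congestion_surcharge": NAN, "airport_fee": NAN},
    {"tip_amount": 2.75, "congestion_surcharge": 2.5, "airport_fee": 0.0},
]
counts = nullity_counts(rows, ["tip_amount", "congestion_surcharge", "airport_fee"])
paired = nullity_correlation(rows, "congestion_surcharge", "airport_fee")
partial = nullity_correlation(rows, "tip_amount", "congestion_surcharge")
```

A nullity correlation of 1 means two columns are always present or absent together, while a column with no missing values has no defined nullity correlation.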

Sample

First rows

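index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee
0 | 1 | 2018-01-01 00:21:05 | 2018-01-01 00:24:23 | 1 | 0.5 | 1 | N | 41 | 24 | 2 | 4.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 5.80 | NaN | NaN
1 | 1 | 2018-01-01 00:44:55 | 2018-01-01 01:03:05 | 1 | 2.7 | 1 | N | 239 | 140 | 2 | 14.0 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 15.30 | NaN | NaN
2 | 1 | 2018-01-01 00:08:26 | 2018-01-01 00:14:21 | 2 | 0.8 | 1 | N | 262 | 141 | 1 | 6.0 | 0.5 | 0.5 | 1.00 | 0.0 | 0.3 | 8.30 | NaN | NaN
3 | 1 | 2018-01-01 00:20:22 | 2018-01-01 00:52:51 | 1 | 10.2 | 1 | N | 140 | 257 | 2 | 33.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 34.80 | NaN | NaN
4 | 1 | 2018-01-01 00:09:18 | 2018-01-01 00:27:06 | 2 | 2.5 | 1 | N | 246 | 239 | 1 | 12.5 | 0.5 | 0.5 | 2.75 | 0.0 | 0.3 | 16.55 | NaN | NaN
5 | 1 | 2018-01-01 00:29:29 | 2018-01-01 00:32:48 | 3 | 0.5 | 1 | N | 143 | 143 | 2 | 4.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 5.80 | NaN | NaN
6 | 1 | 2018-01-01 00:38:08 | 2018-01-01 00:48:24 | 2 | 1.7 | 1 | N | 50 | 239 | 1 | 9.0 | 0.5 | 0.5 | 2.05 | 0.0 | 0.3 | 12.35 | NaN | NaN
7 | 1 | 2018-01-01 00:49:29 | 2018-01-01 00:51:53 | 1 | 0.7 | 1 | N | 239 | 238 | 1 | 4.0 | 0.5 | 0.5 | 1.00 | 0.0 | 0.3 | 6.30 | NaN | NaN
8 | 1 | 2018-01-01 00:56:38 | 2018-01-01 01:01:05 | 1 | 1.0 | 1 | N | 238 | 24 | 1 | 5.5 | 0.5 | 0.5 | 1.70 | 0.0 | 0.3 | 8.50 | NaN | NaN
9 | 1 | 2018-01-01 00:17:04 | 2018-01-01 00:22:24 | 1 | 0.7 | 1 | N | 170 | 170 | 2 | 5.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 6.80 | NaN | NaN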

Last rows

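index | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee
8760677 | 1 | 2018-01-31 23:05:11 | 2018-01-31 23:13:53 | 1 | 2.60 | 1 | N | 137 | 148 | 2 | 10.0 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 11.30 | NaN | NaN
8760678 | 1 | 2018-01-31 23:41:11 | 2018-01-31 23:45:22 | 1 | 0.30 | 1 | N | 163 | 230 | 2 | 4.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 5.80 | NaN | NaN
8760679 | 1 | 2018-01-31 23:52:53 | 2018-01-31 23:57:32 | 1 | 1.20 | 1 | N | 43 | 162 | 1 | 6.0 | 0.5 | 0.5 | 1.50 | 0.0 | 0.3 | 8.80 | NaN | NaN
8760680 | 2 | 2018-01-31 23:20:51 | 2018-01-31 23:30:51 | 1 | 1.65 | 1 | N | 234 | 158 | 1 | 8.5 | 0.5 | 0.5 | 2.45 | 0.0 | 0.3 | 12.25 | NaN | NaN
8760681 | 2 | 2018-01-31 23:57:33 | 2018-02-01 00:07:44 | 1 | 2.92 | 1 | N | 230 | 238 | 2 | 11.0 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 12.30 | NaN | NaN
8760682 | 1 | 2018-01-31 23:21:35 | 2018-01-31 23:34:20 | 2 | 2.80 | 1 | N | 158 | 163 | 1 | 12.0 | 0.5 | 0.5 | 2.65 | 0.0 | 0.3 | 15.95 | NaN | NaN
8760683 | 1 | 2018-01-31 23:35:51 | 2018-01-31 23:38:57 | 1 | 0.60 | 1 | N | 163 | 162 | 1 | 4.5 | 0.5 | 0.5 | 1.15 | 0.0 | 0.3 | 6.95 | NaN | NaN
8760684 | 2 | 2018-01-31 23:28:00 | 2018-01-31 23:37:09 | 1 | 2.95 | 1 | N | 74 | 69 | 2 | 10.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 11.80 | NaN | NaN
8760685 | 2 | 2018-01-31 23:24:40 | 2018-01-31 23:25:28 | 1 | 0.00 | 1 | N | 7 | 193 | 2 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | NaN | NaN
8760686 | 2 | 2018-01-31 23:28:16 | 2018-01-31 23:28:38 | 1 | 0.00 | 1 | N | 7 | 193 | 2 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | NaN | NaN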